feat(optimizer): Implement naive join ordering #3616
CodSpeed Performance Report
Merging #3616 will not alter performance.
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3616      +/-   ##
==========================================
+ Coverage   77.69%   77.93%   +0.23%
==========================================
  Files         710      720      +10
  Lines       86896    88728    +1832
==========================================
+ Hits        67513    69149    +1636
- Misses      19383    19579     +196
#[derive(Clone, Debug)]
pub(super) enum JoinOrderTree {
    Relation(usize), // (id).
    Join(Box<JoinOrderTree>, Box<JoinOrderTree>, Vec<usize>), // (subtree, subtree, nodes involved).
Instead of storing the nodes involved, you can implement Iterator to yield them.
impl Iterator for JoinOrderTree {
...
}
Sounds good! I opted for JoinOrderTree.iter() -> JoinOrderTreeIterator, which then implements Iterator by maintaining a stack of Joins/Relations.
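A minimal sketch of the stack-based approach described in this reply. It is an illustration only, not the code from the PR: the enum is simplified to drop the Vec<usize> field, and the left-to-right traversal order is an assumption.

```rust
// Simplified stand-in for the PR's enum (the stored node list is omitted).
#[derive(Clone, Debug)]
enum JoinOrderTree {
    Relation(usize),
    Join(Box<JoinOrderTree>, Box<JoinOrderTree>),
}

// Iterator that walks the tree using an explicit stack of pending subtrees.
struct JoinOrderTreeIterator<'a> {
    stack: Vec<&'a JoinOrderTree>,
}

impl JoinOrderTree {
    fn iter(&self) -> JoinOrderTreeIterator<'_> {
        JoinOrderTreeIterator { stack: vec![self] }
    }
}

impl<'a> Iterator for JoinOrderTreeIterator<'a> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        while let Some(node) = self.stack.pop() {
            match node {
                // A leaf yields its relation id.
                JoinOrderTree::Relation(id) => return Some(*id),
                // Push right first so the left subtree is visited first.
                JoinOrderTree::Join(left, right) => {
                    self.stack.push(right);
                    self.stack.push(left);
                }
            }
        }
        None
    }
}
```

With this shape, the relation ids under any subtree can be collected on demand, so the Join variant no longer needs to carry its own list of nodes.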
Missed a comment on #3616 (comment):
```
In [src/daft-logical-plan/src/optimization/rules/reorder_joins/join_graph.rs](#3616 (comment)):

> };
> -#[derive(Debug)]
> -struct JoinNode {
> +// TODO(desmond): In the future these trees should keep track of current cost estimates.
> +#[derive(Clone, Debug)]
> +pub(super) enum JoinOrderTree {
> +    Relation(usize), // (id).
> +    Join(Box<JoinOrderTree>, Box<JoinOrderTree>, Vec<usize>), // (subtree, subtree, nodes involved).

I don't think you need to keep an explicit stack. I think you should be able to do something like std::iter::chain(left.into_iter(), right.into_iter())
```
This PR removes the stack and simply chains the iterators.
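One way the chained version could look, as a sketch rather than the merged code. It assumes a simplified enum without the Vec<usize> field and returns a boxed trait object, since a recursive chain has no nameable concrete type; the actual PR may have solved this differently.

```rust
// Simplified stand-in for the PR's enum (the stored node list is omitted).
#[derive(Clone, Debug)]
enum JoinOrderTree {
    Relation(usize),
    Join(Box<JoinOrderTree>, Box<JoinOrderTree>),
}

impl JoinOrderTree {
    // Yields the relation ids in this subtree, left to right, with no explicit stack.
    fn iter(&self) -> Box<dyn Iterator<Item = usize> + '_> {
        match self {
            // A leaf is a one-element iterator.
            JoinOrderTree::Relation(id) => Box::new(std::iter::once(*id)),
            // A join is the left subtree's ids followed by the right subtree's ids.
            JoinOrderTree::Join(left, right) => Box::new(left.iter().chain(right.iter())),
        }
    }
}
```

The trade-off in this sketch is one heap allocation per tree node visited, in exchange for not maintaining a separate iterator struct.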
Applies the naive left-deep join order from #3616 as an optimizer rule. The rule is gated behind an environment variable, which lets us validate it on our current workloads. Currently, join reordering results in errors for 50% of TPC-H queries during join graph building. We'll tackle these in a follow-up PR.
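Gating a rule behind an environment variable can be sketched as below. The variable name EXAMPLE_ENABLE_JOIN_REORDERING and both function names are hypothetical, chosen for illustration; the description above does not name the real variable or say which values it accepts.

```rust
use std::env;

// Hypothetical variable name; the real one is not given in the PR text.
const JOIN_REORDERING_ENV_VAR: &str = "EXAMPLE_ENABLE_JOIN_REORDERING";

// Treats "1" or "true" (case-insensitive, surrounding whitespace ignored) as enabled.
// Accepting exactly these values is an assumption of this sketch.
fn flag_is_enabled(value: Option<&str>) -> bool {
    matches!(
        value.map(|v| v.trim().to_ascii_lowercase()).as_deref(),
        Some("1") | Some("true")
    )
}

// Reads the environment and reports whether the rule should run.
// An unset variable means the rule stays off.
fn join_reordering_enabled() -> bool {
    flag_is_enabled(env::var(JOIN_REORDERING_ENV_VAR).ok().as_deref())
}
```

Keeping the parsing in a pure function makes the default-off behaviour easy to check without touching the process environment.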
Implements a naive join orderer that simply joins relations in arbitrary order (as long as a valid join condition exists).
This is intended as a building block to ensure that our join graphs can be correctly reconstructed into logical plans. The PR immediately following this one will add an optimization rule that applies naive join ordering. The rule will be hidden behind a config flag, but will let us test logical plan reconstruction on all our integration tests.
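The naive ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: relations are plain indices, join conditions are given as an edge list, the function name naive_left_deep_order is hypothetical, and "arbitrary" is resolved by picking the lowest-numbered connectable relation.

```rust
use std::collections::HashSet;

// Simplified stand-in for the PR's enum (the stored node list is omitted).
#[derive(Clone, Debug, PartialEq)]
enum JoinOrderTree {
    Relation(usize),
    Join(Box<JoinOrderTree>, Box<JoinOrderTree>),
}

// Builds a left-deep tree over relations 0..num_relations.
// `edges` lists pairs of relations that share a join condition.
// Returns None if some relation cannot be connected to the ones already joined.
fn naive_left_deep_order(num_relations: usize, edges: &[(usize, usize)]) -> Option<JoinOrderTree> {
    if num_relations == 0 {
        return None;
    }
    let mut joined: HashSet<usize> = HashSet::from([0]);
    let mut tree = JoinOrderTree::Relation(0);
    while joined.len() < num_relations {
        // Pick any unjoined relation with a join condition to the current set.
        let next = (0..num_relations).find(|r| {
            !joined.contains(r)
                && edges.iter().any(|&(a, b)| {
                    (a == *r && joined.contains(&b)) || (b == *r && joined.contains(&a))
                })
        })?;
        joined.insert(next);
        // Left-deep: the tree built so far always becomes the left child.
        tree = JoinOrderTree::Join(Box::new(tree), Box::new(JoinOrderTree::Relation(next)));
    }
    Some(tree)
}
```

No cost estimates are consulted here, which matches the stated goal of exercising plan reconstruction rather than finding a good order.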